Search Results: "evgeni"

Let s assume you are a sysadmin and have to debug a daemon giving bad performance on one machine, but not on the other. Of course, you did not setup either machine, have only basic knowledge of the said daemon and would really love to watch that awesome piece of cinematographic art with a bunch of friends and a couple of beers. So it s like every day, right? The problem with understanding running setups is that you often have to read configuration files. And when reading one is not enough, you have to compare two or more of them. Suddenly, a wild problem occurs: order and indentation do not matter (unless they do), comments are often just beautiful noise and why the hell did that guy smoke/drink/eat while explicitly setting ALL THE OPTIONS to their defaults before actually setting them as he wanted. If you are using diff(1), you probably love to read a lot of differences, which are none in reality. Want an example? [foo]
bar = bar
foo = foo and # settings for foo
[foo]
# foo is best
foo = foo
# bar is ok here, FIXME?
bar = bar and [foo]
foo = x
bar = x
[foo]
foo = foo
bar = bar are actually the same, at least for some parsers. XTaran suggested using something like wdiff or dwdiff, which often helps, but not in the above case. Others suggested vimdiff, which is nice, but not really helpful here either. As there is a problem, and I love to solve these, I started a small new project: cfgdiff. It tries to parse two given files and give a diff of the content after normalizing it (merging duplicate keys, sorting keys, ignoring comments and blank lines, you name it). Currently it can parse various INI files, JSON, YAML and XML. That s probably not enough to be the single diff tool for configuration files, but it is quite a nice start. And you can extend it, of course ;)

When you run Puppet, it is very important to monitor whether all nodes have an uptodate catalog and did not miss the last year of changes because of a typo in a manifest or a broken cron-script. The most common solution to this is a script that checks /var/lib/puppet/state/last_run_summary.yaml on each node. While this is nice and easy in a small setup, it can get a bit messy in a bigger environment as you have to do an NRPE call for every node (or integrate the check as a local check into check_mk). Given a slightly bigger Puppet environment, I guess you already have PuppetDB running. Bonuspoints if you already let it save the reports of the nodes via reports = store,puppetdb. Given a central knowledgebase about your Puppet environment one could ask PuppetDB about the last node runs, right? I did not find any such script on the web, so I wrote my own: check_puppetdb_nodes. The script requires a recent (1.5) PuppetDB and a couple of Perl modules (JSON, LWP, Date::Parse, Nagios::Plugin) installed. When run, the script will contact the PuppetDB via HTTP on localhost:8080 (obviously configurable via -H and -p, HTTPS is available via -s) and ask for a list of nodes from the /nodes endpoint of the API. PuppetDB will answer with a list of all nodes, their catalog timestamps and whether the node is deactivated. Based on this result, check_puppetdb_nodes will check the last catalog run of all not deactivated nodes and issue a WARNING notification if there was none in the last 2 hours (-w) or a CRITICAL notification if there was none for 24 hours (-c). As a fresh catalog does not mean that the node was able to apply it, check_puppetdb_nodes will also query the /event-counts endpoint for each node and verify that the node did not report any failures in the last run (for this feature to work, you need reports stored in PuppetDB). You can modify the thresholds for the number of failures that trigger a WARNING/CRITICAL with -W and -C, but I think 1 is quite a reasonable default for a CRITICAL in this case. Using check_puppetdb_nodes you can monitor the health of ALL your Puppet nodes with a singe NRPE call. Or even with zero, if your monitoring host can access PuppetDB directly.

Dear all, we are proud to announce the 3.0 release of the Shogun Machine-Learning Toolbox. This release features the incredible projects of our 8 hard-working Google Summer of Code students. In addition, you get other cool new features as well as lots of internal improvements, bugfixes, and documentation improvements. To speak in numbers, we got more than 2000 commits changing almost 400000 lines in more than 7000 files and increased the number of unit tests from 50 to 600. This is the largest release that Shogun ever had! Please visit http://shogun-toolbox.org/ to obtain Shogun. News Here is a brief description of what is new, starting with the GSoC projects, which deserve most fame:

Gaussian Process classification by Roman Votjakov
Structured Output Learning of graph models by Shell Hu
Estimators for log-determinants of large sparse matrices by Soumyajit De
Feature Hashing and random kitchen sinks by Evangelos Anagnostopoulos
Independent Component Analysis by Kevin Hughes
A web-based demo framework by Liu Zhengyang
Metric learning with large margin nearest neighbours by Fernando Iglesias
Native support for various popular file formats by Evgeniy Andreev

Screenshots Everyone likes screenshots. Well, we have got something better! All of the above projects (and more) are now documented in the form of IPython notebooks, combining machine learning fundamentals, code, and plots. Those are a great looking way that we chose to document our framework from now on. Have a look at them and feel free to submit your use case as a notebook!

The web-demo framework has been integrated into our website, go check them out.

Other changes We finally moved to the Shogun build process to CMake. Through GSoC, added a general clone and equals methods to all Shogun objects, and added automagic unit-testing for serialisation and clone/equals for all classes. Other new features include multiclass LDA, and probability outputs for multiclass SVMs. For the full list, see the NEWS. Workshop Videos and slides In case you missed the first Shogun workshop that we organised in Berlin last July, all of the talks have been put online. Shogun in the Cloud As setting up the right environment for shogun and installing it was always one of the biggest problems for the users (hence the switching to CMake), we have created a sandbox where you can try out shogun on your own without installing shogun on your system! Basically it's a web-service which give you access to your own ipython notebook server with all the shogun notebooks. Of course you are more than welcome to create and share your own notebooks using this service! *NOTE*: This is a courtesy service created by Shogun Toolbox developers, hence if you like it please consider some form of donation to the project so that we can keep up this service running for you. Try shogun in the cloud. Thanks The release has been made possible by the hard work of all of our GSoC students, see list above. Thanks also to Thoralf Klein and Bj rn Esser for the load of great contributions. Last but not least, thanks to all the people who use Shogun and provide feedback. S ren Sonnenburg on behalf of the Shogun team (+ Viktor Gal, Sergey Lisitsyn, Heiko Strathmann and Fernando Iglesias)

Some days ago I got myself a new shiny Samsung 840 Pro 256GB SSD for my laptop. The old 80GB Intel was just too damn small. Instead of just doing a pvmove from the old to the new, I decided to set up the system from scratch. That is an awesome way to get rid of old and unused stuff or at least move it to some lower class storage (read: backup). One of the things I did not bother to copy from the old disk were my ~/Debian, ~/Grml and ~/Devel folders. I mean, hey, it s all in some kind of VCS, right? I can just clone it new, if I really want. Neither I copied much of my dotfiles, these are neatly gitted with the help of RichiH s awesome vcsh and a bit of human brains (no private keys on GitHub, yada yada). After cloning a couple of my personal repos from GitHub to ~/Devel, I realized I was doing a pretty dumb job, a machine could do for me. As I already was using Joey s mr for my vcsh repositories, generating a mr config and letting mr do the actual job was the most natural thing to do. So was using Python Requests and GitHub s JSON API. And here is Mister Hubert, aka mrhub: github.com/evgeni/MisterHubert. Just call it with your GitHub username and you get a nice mr config dumped to stdout. Same applies for organizations.

Authentication for private repos? (-p)
Other clone mechanisms? (-c)
A help function? (-h)
Other features?

As usual, I hope this is useful :)

Sorry Bryan,
I can show you plenty of hardware that is perfectly 64 bit capable but probably never will run Ubuntu and/or Unity. First, what is 64 bit for you? Looking at ubuntu.com/download and getting images from there, one gets the impression, that 64 bit is amd64 (also called x86_64). If one digs deeper to cdimage.ubuntu.com, one will find non-Intel images too: PowerPC and amrhf. As the PowerPC images are said to boot on G3 and G4 PowerPCs, these are 32 bit. Armhf is 32 bit too (arm64/aarch64 support in Linux is just evolving). So yes, if 64 bit means amd64, I do have hardware that can run Unity. But you asked if I have hardware that is 64 bit capable and can run Ubuntu/Unity, so may I apply my definiton of 64 bit here? I have an old Sun Netra T1-200 (500MHz UltraSPARC IIe) running Debian s sparc port, which has a 64 bit kernel and 32 bit userland. Unity? No wai. I do not own any ia64 or s390/s390x machines, but I am sure people do. And guess what, no Unity there either :) Sorry for ranting like this, but 64 bit really just means that the CPU can handle 64 bit big addresses etc. End even then, it not always will do so ;)

You may not know this, but I am a huge PowerDNS fan. This may be because it is so simple to use, supports different databases as backends or maybe just because I do not like BIND, pick one. I also happen to live in Germany where ISPs usually do not give static IP-addresses to private customers. Unless you pay extra or limit yourself to a bunch of providers that do good service but rely on old (DSL) technology, limiting you to some 16MBit/s down and 1MBit/s up. Luckily my ISP does not force the IP-address change, but it does happen from time to time (once in a couple of month usually). To access the machine(s) at home while on a non-IPv6-capable connection, I have been using my old (old, old, old) DynDNS.com account and pointing a CNAME from under die-welt.net to it. Some time ago, DynDNS.com started supporting AAAA records in their zones and I was happy: no need to type hostname.ipv6.kerker.die-welt.net to connect via v6 just let the application decide. Well, yes, almost. It s just DynDNS.com resets the AAAA record when you update the A record with ddclient and there is currently no IPv6 support in any of the DynDNS.com clients for Linux. So I end up with no AAAA record and am not as happy as I should be. Last Friday I got a mail from DynDNS:

Starting now, if you would like to maintain your free Dyn account, you must now log into your account once a month. Failure to do so will result in expiration and loss of your hostname. Note that using an update client will no longer suffice for this monthly login. You will still continue to get email alerts every 30 days if your email address is current.
Yes, thank you very much

Given that I have enough nameservers under my control and love hacking, I started writing an own dynamic DNS service. Actually you cannot call it a service. Or dynamic. But it s my own, and it does DNS: powerdyn. It is actually just a script, that can update DNS records in SQL (from which PowerDNS serves the zones). When you design such a service , you first think about user authentication and proper information transport. The machine that runs my PowerDNS database is reachable via SSH, so let s use SSH for that. You do not only get user authentication, server authentication and properly crypted data transport, you also do not have to try hard to find out the IP-address you want to update the hostname to, just use $SSH_CLIENT from your environment. If you expected further explanation what has to be done next: sorry, we re done. We have the user (or hostname) by looking at the SSH credentials, and we have the IP-address to update it to if the data in the database is outdated. The only thing missing is some execution daemon or cron(8). :) The machine at home has the following cron entry now:

*/5 * * * * ssh -4 -T -i /home/evgeni/.ssh/powerdyn_rsa powerdyn@ssh.die-welt.net

This connects to the machine with the database via v4 (my IPv6 address does not change) and that s all.
As an alternative, one can add the ssh call in /etc/network/if-up.d/, /etc/ppp/ip-up.d/ or /etc/ppp/ipv6-up.d (depending on your setup) to be executed every time the connection goes up. The machine with the database has the following authorized_keys entry for the powerdyn user:

no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding,no-user-rc,\ 
command="/home/powerdyn/powerdyn/powerdyn dorei.kerker.die-welt.net" ssh-rsa AAAA... evgeni@dorei

By forcing the command, the user has no way to get the database-credentials the script uses to write to the database and neither cannot update a different host. That seems secure enough for me. It won t scale for a setup as DynDNS.com and the user-management sucks (you even have to create the entries in the database first, the script can only update them), but it works fine for me and I bet it would for others too :) Update: included suggestions by XX and Helmut from the comments.

TL;DR: fqdn, "jabber.die-welt.net" . So, how many servers do you have, that are still running Squeeze? I count one, mostly because I did not figure out a proper upgrade path from OpenVZ to something else yet, but this is a different story. This post is about the upgrade of my communication machine, dengon.die-welt.net. It runs my private XMPP and IRC servers. I upgraded it to Wheezy, checked that my irssi and my BitlBee still could connect and left for work. There I noticed, that Pidgin could only connect to one of the two XMPP accounts I have on that server. sargentd@jabber.die-welt.net worked just fine, while evgeni@golov.de failed to connect. ejabberd was logging a failed authentication:

I(<0.1604.0>:ejabberd_c2s:802) : ( socket_state,tls, tlssock,#Port<0.5130>,#Port<0.5132> ,<0.1603.0> ) Failed authentication for evgeni@golov.de

While Pidgin was just throwing Not authorized errors. I checked the password in Pidgin (even if it did not change). I tried different (new) accounts: anything@jabber.die-welt.net worked, nothing@golov.de did not and somethingdifferent@jabber.<censored>.de worked too. So where was the difference between the three vhosts? jabber.die-welt.net and jabber.<censored>.de point directly (A/CNAME) to dengon.die-welt.net. golov.de has SRV records for XMPP pointing to jabber.die-welt.net. Let s ask Google about ejabberd pidgin srv . There are some bugs. But they are marked as fixed in Wheezy. Mhh Let s read again Okay, I have to set fqdn, "<my_srv_record_name>" . when this does not match my hostname. Edit /etc/ejabberd/ejabberd.cfg, add fqdn, "jabber.die-welt.net" . (do not forget the dot at the end) and restart the ejabberd. Pidgin can connect again. Yeah.

So you probably heard that I have that little new project of mine: QiFi the pure JavaScript WiFi QR Code Generator. It s been running pretty well and people even seem to like it. One of its (unannounced) features is a pretty clean stylesheet that is used for printing. When you print the result will be just the SSID and the QR code, so you can put that piece of paper everywhere you like. That works (I tested!) fine on Iceweasel/Firefox 10.0.12 and Chromium 25.0. Today I tried to do the same in Opera 12.14 and it failed terribly: the SSID was there, the QR code not. And here my journey begins First I suspected the CSS I used was fishy, so I kicked all the CSS involved and retried: still no QR code in the print-out. So maybe it s the QR code library I use that produces a weird canvas? Nope, the examples on http://diveintohtml5.info/canvas.html and http://devfiles.myopera.com/articles/649/example5.html don t print either. Uhm, let s Google for opera canvas print And oh boy I should not have done that. It seems it s a bug in Opera. And the proposed solution is to use canvas.toDataURL() to render the canvas as an image and load the image instead of the canvas. I almost went that way. But I felt that urge need to read the docs before. So I opened http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#dom-canvas-todataurl and https://developer.mozilla.org/en-US/docs/DOM/HTMLCanvasElement and started puking:

When trying to use types other than image/png , authors can check if the image was really returned in the requested format by checking to see if the returned string starts with one of the exact strings data:image/png, or data:image/png; . If it does, the image is PNG, and thus the requested type was not supported. (The one exception to this is if the canvas has either no height or no width, in which case the result might simply be data:, .)

If the type requested is not image/png, and the returned value starts with data:image/png, then the requested type is not supported.

Really? I have to check the returned STRING to know if there was an error? Go home HTML5, you re drunk! Okay, okay. No canvas rendered to images then. Let s just render the QR code as a <table> instead of a <canvas> when the browser looks like Opera. There is nothing one could do wrong with tables, right? But let s test with the basic example first:

Yes, this is 2013. Yes, this is Opera 12.14. Yes, the rendering of a fucking HTML table is wrong. Needles to say, Iceweasel and Chromium render the example just fine. I bet even a recent Internet Explorer would That said, there is no ~~bugfix~~workaround for Opera I want to implement. If you use Opera, I feel sorry for you. But that s all. Update: before someone cries ZOMG! BUG PLZ!!! , I filled this as DSK-383716 at Opera.

Some time ago, the QR Code Generator WiFi Access made quite some noise on the mighty Internet. Sure, it is cool to be able to share your WiFi-access with someone by just showing him a QR code he can scan on his phone and the phone will auto-connect to the WiFi. But I get a strange feeling telling someone I do not know my WiFi credentials. No, I do not mean my guests, I know them. I mean that shiny web-service that will generate a QR code for me. The geek in you will now say: So? Open up a terminal, install qrencode, pipe it the string WIFI:S:<SSID>;T:<WPA WEP >;P:<password>;; and you got our QR code . Yeah, that works. But was it one or two semicolons at the end? And was it really just WPA even if my WiFi uses WPA2? Oh and how do I encode that umlaut again? I do not want to remember this. Thus, without too much rumble, may I present you: QiFi the pure JS WiFi QR Code Generator. QiFi is a QR code generator for WiFi access in pure JavaScript. It will generate the QR code on your machine, in your browser, not leaking your precious credentials to anyone (but your guests). Don t trust me? Read the code. Fork the code. Host the code yourself. I hope you will find QiFi at least slightly useful ;-)

The summer came finally to an end and (yes in Berlin we still had 20 C end of October), unfortunately, so did GSoC with it. This has been the second time for SHOGUN to be in GSoC. For those unfamiliar with SHOGUN - it is a very versatile machine learning toolbox that enables unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. I again played the role of an org admin and co-mentor this year and would like to take the opportunity to summarize enhancements to the toolbox and my GSoC experience: In contrast to last year, we required code-contributions in the application phase of GSoC already, i.e., a (small) patch was mandatory for your application to be considered. This reduced the number of applications we received: 48 proposals from 38 students instead of 70 proposals from about 60 students last year but also increased the overall quality of the applications. In the end we were very happy to get 8 very talented students and have the opportunity of boosting the project thanks to their hard and awesome work. Thanks to google for sponsoring three more students compared to last GSoC. Still we gave one slot back to the pool for good to the octave project (They used it very wisely and octave will have a just-in-time compiler now, which will benefit us all!). SHOGUN 2.0.0 is the new release of the toolbox including of course all the new features that the students have implemented in their projects. On the one hand, modules that were already in SHOGUN have been extended or improved. For example, Jacob Walker has implemented Gaussian Processes (GPs) improving the usability of SHOGUN for regression problems. A framework for multiclass learning by Chiyuan Zhang including state-of-the-art methods in this area such as Error-Correcting Output Coding (ECOC) and ShareBoost, among others. In addition, Evgeniy Andreev has made very important improvements w.r.t. the accessibility of SHOGUN. Thanks to his work with SWIG director classes, now it is possible to use python for prototyping and make use of that code with the same flexibility as if it had been written in the C++ core of the project. On the other hand, completely new frameworks and other functionalities have been added to the project as well. This is the case of multitask learning and domain adaptation algorithms written by Sergey Lisitsyn and the kernel two-sample or dependence test by Heiko Strathmann. Viktor Gal has introduced latent SVMs to SHOGUN and, finally, two students have worked in the new structured output learning framework. Fernando Iglesias made the design of this framework introducing the structured output machines into SHOGUN while Michal Uricar has implemented several bundle methods to solve the optimization problem of the structured output SVM. It has been very fun and interesting how the work done in different projects has been put together very early, even during the GSoC period. Only to show an example of this dealing with the generic structured output framework and the improvements in the accessibility. It is possible to make use of the SWIG directors to implement the application specific mechanisms of a structured learning problem instance in python and then use the rest of the framework (written in C++) to solve this new problem. Students! You all did a great job and I am more than amazed what you all have achieved. Thank you very much and I hope some of you will stick around. Besides all these improvements it has been particularly challenging for me as org admin to scale the project. While I could still be deeply involved in each and every part of the project last GSoC, this was no longer possible this year. Learning to trust that your mentors are doing the job is something that didn't come easy to me. Having had about monthly all-hands meetings did help and so did monitoring the happiness of the students. I am glad that it all worked out nicely this year too. Again, I would like to mention that SHOGUN improved a lot code-base/code-quality wise. Students gave very constructive feedback about our (lack) of proper Vector/Matrix/String/Sparse Matrix types. We now have all these implemented doing automagic memory garbage collection behind scenes. We have started to transition to use Eigen3 as our matrix library of choice, which made quite a number of algorithms much easier to implement. We generalized the Label framework (CLabels) to be tractable for not just classification and regression but multitask and structured output learning. Finally, we have had quite a number of infrastructure improvements. Thanks to GSoC money we have a dedicated server for running the buildbot/buildslaves and website. The ML Group at TU Berlin does sponsor virtual machines for building SHOGUN on Debian and Cygwin. Viktor Gal stepped up providing buildslaves for Ubuntu and FreeBSD. Gunnar Raetschs group is supporting redhat based build tests. We have Travis CI running testing pull requests for breakage even before merges. Code quality is now monitored utilizing LLVMs scan-build. Bernard Hernandez appeared and wrote a fancy new website for SHOGUN. A more detailed description of the achievements of each of the students follows:

Kernel Two-sample/Dependence test
- Student: Heiko Strathmann
- Mentor: Arthur Gretton, Soeren Sonnenburg
Heiko Strathmann, mentored by Arthur Gretton, worked on a framework for kernel-based statistical hypothesis testing. Statistical tests to determine whether two random variables are are equal/different or are statistically independent are an important tool in data-analysis. However, when data are high-dimensional or in non-numerical form (documents, graphs), classical methods fail. Heiko implemented recently developed kernel-based generalisations of classical tests which overcome these issues by representing distributions in high dimensional so-called reproducing kernel Hilbert spaces. By doing so, theoretically any pair samples can be distinguished. Implemented methods include two-sample testing with the Maximum Mean Discrepancy (MMD) and independence testing using the Hilbert Schmidt Independence Criterion (HSIC). Both methods come in different flavours regarding computational costs and test constructions. For two-sample testing with the MMD, a linear time streaming version is included that can handle arbitrary amounts of data. All methods are integrated into a newly written flexible framework for statistical testing which will be extended in the future. A book-style tutorial with descriptions of algorithms and instructions how to use them is also included.
Implement multitask and domain adaptation algorithms
- Student: Sergey Lisityn
- Mentor: Christian Widmer
Like Heiko, Sergey Lisitsyn did participate in the GSoC programme for the second time. This year he focused on implementing multitask learning algorithms. Multitask learning is a modern approach to machine learning that learns a problem together with other related problems at the same time using a shared representation. This approach often leads to a better model for the main task, because it allows the learner to use the commonality among the tasks. During the summer Sergey has ported a few algorithms from the SLEP and MALSAR libraries with further extensions and improvements. Namely, L12 group tree and L1q group multitask logistic regression and least squares regression, trace-norm multitask logistic regression, clustered multitask logistic regression, basic group and group tree lasso logistic regression. All the implemented algorithms use COFFIN framework for flexible and efficient learning and some of the algorithms were implemented efficiently utilizing the Eigen3 library.
Implementation of / Integration via existing GPL code of latent SVMs.
- Student: Viktor Gal
- Mentor: Alexander Binder
A generic latent SVM and additionally a latent structured output SVM has been implemented. This machine learning algorithm is widely used in computer vision, namely in object detection. Other useful application fields are: motif finding in DNA sequences, noun phrase coreference, i.e. provide a clustering of the nouns such that each cluster refers to a single object. It is based on defining a general latent feature Psi(x,y,h) depending on input variable x, output variable y and latent variable h. Deriving a class from the base class allows the user to implement additional structural knowledge for efficient maximization of the latent variable or alternative ways of computation or on-the-fly loading of latent features Psi as a function of the input, output and latent variables.
Bundle method solver for structured output learning
- Student: Michal Uricar
- Mentor: Vojtech Franc
We have implemented two generic solvers for supervised learning of structured output (SO) classifiers. First, we implemented the current state-of-the-art Bundle Method for Regularized Risk Minimization (BMRM) [Teo et al. 2010]. Second, we implemented a novel variant of the classical Bundle Method (BM) [Lamarechal 1978] which achieves the same precise solution as the BMRM but in time up to an order of magnitude shorter [Uricar et al. 2012]. Among the main practical benefits of the implemented solvers belong their modularity and proven convergence guarantees. For training a particular SO classifier it suffices to provide the solver with a routine evaluating the application specific risk function. This feature is invaluable for designers who can concentrate on tuning the classification model instead of spending time on developing new optimization solvers. The convergence guarantees remove the uncertainty inherent in use of on-line approximate solvers. The implemented solvers have been integrated to the structured output framework of the SHOGUN toolbox and they have been tested on real-life data.
Built generic structured output learning framework
- Student: Fernando Jose Iglesias Garcia
- Mentor: Nico Goernitz
During GSoC 2012 Fernando implemented a generic framework for structured output (SO) problems. Structured output learning deals with problems where the prediction made is represented by an object with complex structure, e.g. a graph, a tree or a sequence. SHOGUN's SO framework is flexible and easy to extend [1]. Fernando implemented a na ve cutting plane algorithm for the SVM approach to SO [2]. In addition, he coded a case of use of the framework for labelled sequence learning, the so-called Hidden Markov SVM. The HM-SVM can be applied to solve problems in different fields such as gene prediction in bioinformatics or speech to text in pattern recognition. [1] Class diagram of the SO framework.
[2] Support Vector Machine Learning for Interdependent and Structured Output Spaces.
Improving accessibility to shogun
- Student: Evgeniy Andreev
- Mentor: Soeren Sonnenburg
During the latest google summer of code Evgeniy has improved the Python modular interface. He has added new SWIG-based feature - director classes, enabling users to extend SHOGUN classes with python code and made SHOGUN python 3 ready. Evgeniy has also added python's protocols for most usable arrays (like vectors, matrices, features) which makes possible to work with Shogun data structures just like with numpy arrays with no copy at all. For example one can now modify SHOGUN's RealFeatures in place or use.
Implement Gaussian Processes and regression techniques
- Student: Jacob Walker
- Mentor: Oliver Stegle
Jacob implemented Gaussian Process Regression in the Shogun Machine Learning Toolbox. He wrote a complete implementation of basic GPR as well as approximation methods such as the FITC (First Independent Training Conditional) method and the Laplacian approximation method. Users can utilize this feature to analyze large scale data in a variety of fields. Scientists can also build on this implementation to conduct research extending the capability and applicability of Gaussian Process Regression.
Build generic multiclass learning framework
- Student: Chiyuan Zhang
- Mentor: Cheng Soon Ong
Chiyuan defined a new multiclass learning framework within SHOGUN. He re-organized previous multiclass learning components and refactored the CMachine hierarchy in SHOGUN. Then he generalized the existing one-vs-one and one-vs-rest multiclass learning scheme to general ECOC encoding and decoding strategies. Beside this, several specific multiclass learning algorithms are added, including ShareBoost with feature selection ability, Conditional Probability Tree with online learning ability, and Relaxed Tree.

Do you deliver your mail with maildrop? If not, this post is only for your amusement . My mailserver runs Postfix as MTA and maildrop as MDA, a pretty common setup I d say. And it happens that maildrop supports quota. It supports it so good, that I have no idea how to disable that support, but I also actually never cared, as my user database declares each user has 10GB quota for mails (courier s authtest says Quota: 10000000000S , so does the configuration). And 10GB should be enough for everybody, right? Well, so I thought until I noticed that my Icedove indicated a 99% full mailbox and shortly afterwards maildrop stopped delivering mails with maildir over quota . Looking at the maildirsize file in my maildir, I noticed that the quota is set to 1410065408S, a mere 1.4GB. Where does this number come from? The proficient reader will quickly see that 10000000000 mod 2^32 = 1410065408, so this is actually an integer overflow happening somewhere in the code handling the maildirsize file (read: in maildrop). A short dig through the Debian BTS revealed a bug from 2003, saying exactly the same. The bug also indicated, the issue is fixed since maildrop 2.5. A short cowbuilder run later, I had a maildrop_2.5.5-2_i386.deb, installed it and after the next mail delivery, my quota was at 10GB as it should. TL;DR: If you run into strange maildir over quota errors with maildrop on Debian Squeeze, get a newer maildrop (or backport that single patch to Squeeze s maildrop).

This especially goes to planet.debian.org: SORRY! My WordPress thought it is a great idea to deliver empty (no date, no link, no content) posts, randomly, and planet started to post everything as new as it took the feed. I still haven t reenabled all the plugins, but it runs stable for several hours now and I ll try not to break it again.

good news: I'm seeing more & more people contributing to RC bugs in the BTS. here are my own contributions for the past week:

#656552 zope2.12-sandbox: "zope2.12-sandbox: fails to install, remove and install again"
add info to bug report
#670405 ekiga: "ekiga: Ekiga crashes on startup"
add info to bug report
~~#673370~~ centerim: "centerim: Crash at startup (suspect login msn)"
add info to bug report, then close on submitter's behalf
#683061 ntp: "ntp: missing init script dependency on $named"
sponsor NMU from Ivo De Decker (fix init script), upload to DELAYED/2, then cancelled by maintainer after severity change
~~#684019~~ python-liblcms: "python-liblcms: static library in /usr/share"
re-enable dropping of .a and .la files from python package, upload to DELAYED/2
~~#684024~~ src:python-adodb: "python-adodb: broken binary-indep target"
apply patches from Evgeni Golov (d/rules targets) and Jakub Wilk (python-support handling), upload to DELAYED/2
~~#684078~~ wv2: "calligra: Buffer overflow"
apply modified patch from KDE, upload to DELAYED/2
#684150 python-wxgtk2.8: "python-wxgtk2.8 breaks upgrade from squeeze to wheezy"
add info to bug report
~~#684180~~ angband: "angband: removes files that were installed by another package: /usr/share/angband/*"
don't remove directory belonging to other package in postrm, upload to DELAYED/2
~~#684488~~ puppet-lint: "puppet-lint not working with ruby1.9.1"
send a patch to the BTS

Search Results: "evgeni"

16 February 2014

Evgeni Golov: diffing configuration files made easy

3 February 2014

Evgeni Golov: Monitoring your Puppet nodes using PuppetDB

29 October 2013

Soeren Sonnenburg: Shogun Toolbox Version 3.0 released!

25 July 2013

Evgeni Golov: Say hello to Mister Hubert!

17 June 2013

Evgeni Golov: Running Debian without Unity on a machine that is 64 bit capable!

19 May 2013

Evgeni Golov: powerdyn a dynamic DNS service for PowerDNS users

7 May 2013

Evgeni Golov: Wheezy, ejabberd, Pidgin and SRV records

30 March 2013

Evgeni Golov: Opera, standards and why I should have stayed in my cave

20 March 2013

Evgeni Golov: QiFi the pure JS WiFi QR Code Generator

27 October 2012

Soeren Sonnenburg: Shogun at Google Summer of Code 2012

23 September 2012

Evgeni Golov: 1410065408S

18 September 2012

Evgeni Golov: sorry for the spam

Evgeni Golov: the fairy tale of the UNIVERSAL serial bus

Evgeni Golov: Looking for new NAS hardware

Evgeni Golov: I am the coolest Debian fanboy

Evgeni Golov: RC bugs 2012/27 and 2012/28

Evgeni Golov: Desktop in a Shell: irssi with nicklist support and away nicks

Evgeni Golov: Debian at FrOSCon 2012

Evgeni Golov: Why I hope Twitter will die with the new API

26 August 2012

Gregor Herrmann: RC bugs 2012/34